-
Notifications
You must be signed in to change notification settings - Fork 26.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Idefics 3! #32473
Add Idefics 3! #32473
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just have a few comments regarding consistency with other VLMs like Chameleon + nits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yaaay, Idefics3 is here!
I just left a few comments to make the ongoing work on unifying a bit VLMs easier for us, but didn't really review the whole PR
past_seen_tokens = 0 | ||
return_legacy_cache = False | ||
if use_cache: | ||
if not isinstance(past_key_values, Cache): # kept for BC (non `Cache` `past_key_values` inputs) | ||
return_legacy_cache = True | ||
past_key_values = DynamicCache.from_legacy_cache(past_key_values) | ||
past_seen_tokens = past_key_values.get_seq_length() | ||
|
||
if inputs_embeds is not None and input_ids is None and past_seen_tokens == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have to support old-style cache for new models, can go directly with DynamicCache. Finally deprecated tuple cache in all decoder-only models, yay :)
And btw, past-cache-length should be obtained from cache_position
, afaik we'll stop using the get_seq_length()
some time in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! I removed the support for the old-style cache.
But I tried to use cache_position
and got this error:
AttributeError: 'DynamicCache' object has no attribute 'cache_position'
So I reverted to get_seq_length()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because past_key_values
can be None, it results in an exception if we load the model without specifying use_case=False. E.g.,
Idefics3ForConditionalGeneration.from_pretrained(base_model_id, use_cache=False)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take care of cache-related changes after the PR is merged, as part of work going on "new-cache compatibility"
And yes, past_kv can be None, but the logic with get_seq_length()
should work as long as we check for Noneness
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work adding this model!
The main comment is for the tests to be properly aligned with the new model behaviours, in particular the processor and image processor.
Some general nits for the modeling file - the main one being all classes and method which come from idefics2 should have # Copied from
comments.
Before the PR can be merged, all slow tests will need to be run & pass. These should be triggered in subsequent commits (I might need to approve the workflow for them to run)
if isinstance(image, Image.Image): | ||
width, height = image.size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The images should never be Image.Image here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left this here as it is a custom transforms for idefics3. If the images are passed as PIL objects, we don't convert them to numpy arrays until later in the processing pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also raising a warning once if the input is not PIL images. But it will still work for numpy arrays or other types of inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the warning as now it works perfectly with numpy arrays :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still should be removed - we shouldn't be processing PIL images
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed!
if isinstance(image, Image.Image): | ||
cropped_image = image.crop((start_x, start_y, end_x, end_y)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - there shouldn't be any PIL code for the transformations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left this here as it is a custom transforms for idefics3. If the images are passed as PIL objects, we don't convert them to numpy arrays until later in the processing pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After further discussion with Amy, I added a large change to support processing the images as numpy arrays. Here: 5a0c0f4
1c10a48
to
5a0c0f4
Compare
f67ed1e
to
e71711d
Compare
updated main |
0f7a8e6
to
4756044
Compare
@andimarafioti I can see that you re-requested review, but there's still some debugging commits being pushed so will hold of reviewing until this has been resolved. I'm going to unsubscribe to prevent getting notifications for every push - as soon as you have a question or want to let me know it's ready, just ping me with |
ef576cb
to
483d5d8
Compare
@amyeroberts ready to review! There is still the multi-gpu tests that is queued but if those fail I would skip them. They run OOM in the CI on single GPU and there is an issue open for the same on idefics2 #32288. If my fix here works, I already opened a PR for that fix as well: #32840 |
Talked to @molbap and he said that there is an issue with the multi-gpu workers getting stuck in the queue. But it's not related to this PR. |
@andimarafioti Yes, unfortunately we're having issues at the moment with single GPU and multi-GPU runners taking a long time to run / never running cc @ydshieh. It looks like the multi GPU tests did eventually run and there's at least one test which is currently failing to be addressed |
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
I rebased on main |
7ee2ec8
to
ee041bf
Compare
1bbf7ba
to
39d88b2
Compare
|
||
# Copied from transformers.models.idefics2.modeling_idefics2.Idefics2PreTrainedModel._init_weights | ||
def _init_weights(self, module): | ||
std = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it is, this always assigns self.config.text_config.initializer_range
while, from what I understand, it should assign self.config.initializer_range
in case hasattr(self.config, "initializer_range")
. Is it possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you're right. This seems to also be a mistake on idefics2. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great - I think we're good to go!
* Add Idefics 3! * fixes to make both pipelines identical * fix for quantized models * First pass at the review * remove vocab size from the main config (it's still in the text_config) * hot fix for merve * Apply suggestions from code review Co-authored-by: amyeroberts <[email protected]> * re-add model_type for text_config * remove support for old_cache * remove hidden_size from main config * rename idefics3 HF repo * few changes suggested in the PR * fix to input_data_format computation * remove overwrite of _autoset_attn_implementation following @zucchini-nlp suggestion * improve example * few improvements from amy's review * big change to enable processing input images as numpy arrays * Changes to the code to uniformize processor kwargs * image processing tests * image processing tests fixes and some bugs they discovered * addressed review comments from Yoni * fix modeling tests * remove special tokens that are not special * fixes tests * skip failing tests - they also fail for idefics2 * added paper and readded the tests with multi gpu, who knows * Update docs/source/en/model_doc/idefics3.md Co-authored-by: amyeroberts <[email protected]> * Apply suggestions from code review Co-authored-by: amyeroberts <[email protected]> * review amy until image_processing_idefics3 * last comments from Amy * review amy * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/idefics3/modeling_idefics3.py Co-authored-by: amyeroberts <[email protected]> * Update docs/source/en/model_doc/idefics3.md Co-authored-by: amyeroberts <[email protected]> * doc improvement - amy review * fix runtime error during fine-tuning * amy's review * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/models/idefics3/modeling_idefics3.py Co-authored-by: amyeroberts <[email protected]> * ruff * amy's comment on the order * ruff ruff * fix copies * square images when they are not splitted * ruff :( * Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <[email protected]> * Update tests/models/idefics3/test_processing_idefics3.py Co-authored-by: amyeroberts <[email protected]> * fix small bug introduced in refactor * amy's image processing changes * fixes peft tests and ruff * modify to_pil_image from transformers. and review from emanuele. * add modified to_pil_image --------- Co-authored-by: amyeroberts <[email protected]>
What does this PR do?
Adding the Idefics 3 model.
There are still a few things to do before merging this PR. The results are not exactly the same as with our codebase and the tests are not done. We are opening it to unblock our release.
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Models: